Improved record linkage for encrypted identifying data

Authors

  • Chaoyi Pang
  • David Hansen

Abstract

The health data integration project at the E-Health Research Centre is researching ways of improving the integration of health and health-related data while maintaining the privacy and security of that data. One such way is to improve the mechanisms for matching patients across databases when the identifying information must not be revealed, even during the linkage step.

Background: With health-related data spread across many administrative and clinical databases, the ability to bring the data together dynamically is important. This may be needed to support clinical decision making, administrative reporting, or access to data for clinical research.

Objectives: Mechanisms for blindfolded record linkage have already been published. One way to further strengthen the security and privacy of these algorithms is to encrypt the identifying data, such as name and date of birth, before performing the linkage step. However, encryption maps even slightly different inputs to unrelated outputs, so encrypted data can only be matched exactly, which limits the ability to tolerate errors in the data. This work presents a mechanism for matching encrypted data when the data may contain errors.

Methods: A public reference table common to both data custodians is used. Each value in the original data is compared with the entries of the public reference table using an edit distance function. Names from the reference table that are within a given distance of the original value are sent to the linker. The data from the two data custodians are then compared to decide the likelihood that two records match.

Results: The method described in this paper performs better than other methods that support matching of encrypted data, such as exact matching or matching using Soundex.

Discussion and Conclusion: The method described in this paper can be used to improve the level of record matching in tools where access to identifying data is prohibited. It is currently being added to the HDI software tool as another mechanism for matching records between databases.
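The reference-table method in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the edit distance variant, the threshold, the encryption scheme, or how the linker compares the data, so all of these are assumptions here. Levenshtein distance serves as the edit distance. Each custodian replaces a value with keyed HMAC-SHA256 digests of every reference-table entry within `max_distance` edits; HMAC stands in for encryption, and the key is assumed to be shared by the custodians but withheld from the linker (the reference table is public, so anyone holding the key could hash every entry). The linker scores a pair of values with the Dice coefficient of their digest sets. `encode_value`, `similarity`, the sample reference table and the key are hypothetical.

```python
import hashlib
import hmac


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def encode_value(value: str, reference_table: list[str],
                 max_distance: int, key: bytes) -> set[str]:
    """Custodian side (hypothetical helper): replace a value with keyed
    digests of every reference entry within max_distance edits of it."""
    value = value.strip().upper()
    neighbours = {ref for ref in reference_table
                  if edit_distance(value, ref) <= max_distance}
    # HMAC-SHA256 stands in for the encryption step; only digests leave the custodian.
    return {hmac.new(key, ref.encode("utf-8"), hashlib.sha256).hexdigest()
            for ref in neighbours}


def similarity(tokens_a: set[str], tokens_b: set[str]) -> float:
    """Linker side (assumed scoring): Dice coefficient of two digest sets."""
    if not tokens_a or not tokens_b:
        return 0.0
    return 2 * len(tokens_a & tokens_b) / (len(tokens_a) + len(tokens_b))


if __name__ == "__main__":
    # Hypothetical public reference table (stored in upper case) and shared key.
    reference = ["SMITH", "SMYTH", "SMITHE", "SMIT", "JONES", "JONAS"]
    key = b"custodians-shared-secret"

    a = encode_value("Smith", reference, 1, key)
    b = encode_value("Smyth", reference, 1, key)
    c = encode_value("Jones", reference, 1, key)
    print(round(similarity(a, b), 3))  # 0.667
    print(similarity(a, c))            # 0.0
```

With a threshold of 1, "Smith" and "Smyth" share two reference neighbours and score about 0.67, whereas "Smith" and "Jones" share none. A value with no reference entry within the threshold encodes to an empty set and cannot be matched, so in this sketch the coverage of the reference table bounds how many erroneous values can still be linked.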


Related articles

Privacy Preserving Probabilistic Record Linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality

BACKGROUND Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: either a unique personal identifier, like a social security number, is not available, or non-unique personally identifiable information, like names, is privacy-protected and cannot be accessed. A s...


Privacy-preserving record linkage using Bloom filters

BACKGROUND Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. METHOD...


Private Key based query on encrypted data

Nowadays, users of information systems are inclined to use a central server to reduce data transfer and maintenance costs. Since such a server is not fully trustworthy, users' data is usually kept encrypted. However, encryption is not a panacea for security problems and cannot guarantee data security. In other words, there are some techniques that can endanger security of encrypted d...


An Empirical Comparison of Approaches to Approximate String Matching in Private Record Linkage

Due to the frequency of spelling and typographical errors in practical applications, record linkage algorithms have to use string similarity functions. In many legal contexts, identifiers such as names have to be encrypted before a record linkage can be attempted. Therefore, algorithms for computing string similarity functions with encrypted identifiers are essential for approximating string ma...


Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When comprehensive information about a topic is scattered across two or more data sets, using only one of those data sets would lose the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...



Publication date: 2006